转移学习是开发性能RL代理的越来越普遍的方法。但是,尚不清楚如何定义源和目标任务之间的关系,以及这种关系如何有助于成功转移。我们提出了一种称为两个MDP或SS2的结构相似性的算法,该算法基于先前开发的双仿真指标来计算两个有限MDP的状态的状态相似性度量,并表明该量度满足距离度量的属性。然后,通过GRIDWORLD导航任务的经验结果,我们提供了证据表明,距离度量可用于改善Q学习剂的转移性能,而不是先前的实现。
translated by 谷歌翻译
The deployment of machine learning models in safety-critical applications comes with the expectation that such models will perform well over a range of contexts (e.g., a vision model for classifying street signs should work in rural, city, and highway settings under varying lighting/weather conditions). However, these one-size-fits-all models are typically optimized for average case performance, encouraging them to achieve high performance in nominal conditions but exposing them to unexpected behavior in challenging or rare contexts. To address this concern, we develop a new method for training context-dependent models. We extend Bridge-Mode Connectivity (BMC) (Garipov et al., 2018) to train an infinite ensemble of models over a continuous measure of context such that we can sample model parameters specifically tuned to the corresponding evaluation context. We explore the definition of context in image classification tasks through multiple lenses including changes in the risk profile, long-tail image statistics/appearance, and context-dependent distribution shift. We develop novel extensions of the BMC optimization for each of these cases and our experiments demonstrate that model performance can be successfully tuned to context in each scenario.
translated by 谷歌翻译
创建能够证明终身学习的人工智能(AI)系统是一个基本挑战,并且已经提出了许多方法和指标来分析算法属性。但是,对于现有的终身学习指标,算法贡献被任务和场景结构混淆。为了减轻此问题,我们引入了一种算法 - 敏捷的可解释的替代模型方法,以估计终身学习算法的潜在特性。我们验证通过合成数据实验估算这些特性的方法。为了验证替代模型的结构,我们分析了来自流行的终身学习方法和基准的真实绩效数据,这些基线适用于终身分类和终身强化学习。
translated by 谷歌翻译
大多数深度加强学习(DRL)的方法试图一次解决单一任务。因此,大多数现有的研究基准组成包括具有普通接口,但在其感知特征,目标或奖励结构中重叠的单独游戏或套房。促进培训代理人的知识转移(例如,通过多任务和元学习),需要更多的环境套件,提供具有足够共同的可配置任务,以共同研究待研究。在本文中,我们提供了Meta Arcade,该工具可以轻松定义和配置共享公共视觉效果,状态空间,动作空间,游戏组件和评分机制的自定义2D街机游戏。元拱门与现有环境不同,因为任职性共性和可配置性都优先考虑:可以从公共元素构建整组游戏,并且这些元素可通过暴露参数调节。我们包括一套24个预定义的游戏,共同说明了该框架的可能性,并讨论如何为研究应用程序配置这些游戏。我们提供了几个实验,说明了可以使用Meta Arcade如何使用,包括预定义游戏的单项任务基准,以设定的时间表更改游戏参数的示例课程的方法,以及游戏之间的转移学习探索。
translated by 谷歌翻译
我们展示了一种带有Openai健身房界面的作物仿真环境,并应用现代深度加强学习(DRL)算法以优化产量。我们经验表明,DRL算法可用于发现新的政策和方法,以帮助优化作物产量,同时最小化水和肥料使用等约束因素。我们提出这种混合厂建模和数据驱动的方法,用于发现新策略的优化作物产量可能有助于满足越来越多的全球粮食需求,由于人口扩张和气候变化。
translated by 谷歌翻译
There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .
translated by 谷歌翻译
The purpose of this work was to tackle practical issues which arise when using a tendon-driven robotic manipulator with a long, passive, flexible proximal section in medical applications. A separable robot which overcomes difficulties in actuation and sterilization is introduced, in which the body containing the electronics is reusable and the remainder is disposable. A control input which resolves the redundancy in the kinematics and a physical interpretation of this redundancy are provided. The effect of a static change in the proximal section angle on bending angle error was explored under four testing conditions for a sinusoidal input. Bending angle error increased for increasing proximal section angle for all testing conditions with an average error reduction of 41.48% for retension, 4.28% for hysteresis, and 52.35% for re-tension + hysteresis compensation relative to the baseline case. Two major sources of error in tracking the bending angle were identified: time delay from hysteresis and DC offset from the proximal section angle. Examination of these error sources revealed that the simple hysteresis compensation was most effective for removing time delay and re-tension compensation for removing DC offset, which was the primary source of increasing error. The re-tension compensation was also tested for dynamic changes in the proximal section and reduced error in the final configuration of the tip by 89.14% relative to the baseline case.
translated by 谷歌翻译
Compliance in actuation has been exploited to generate highly dynamic maneuvers such as throwing that take advantage of the potential energy stored in joint springs. However, the energy storage and release could not be well-timed yet. On the contrary, for multi-link systems, the natural system dynamics might even work against the actual goal. With the introduction of variable stiffness actuators, this problem has been partially addressed. With a suitable optimal control strategy, the approximate decoupling of the motor from the link can be achieved to maximize the energy transfer into the distal link prior to launch. However, such continuous stiffness variation is complex and typically leads to oscillatory swing-up motions instead of clear launch sequences. To circumvent this issue, we investigate decoupling for speed maximization with a dedicated novel actuator concept denoted Bi-Stiffness Actuation. With this, it is possible to fully decouple the link from the joint mechanism by a switch-and-hold clutch and simultaneously keep the elastic energy stored. We show that with this novel paradigm, it is not only possible to reach the same optimal performance as with power-equivalent variable stiffness actuation, but even directly control the energy transfer timing. This is a major step forward compared to previous optimal control approaches, which rely on optimizing the full time-series control input.
translated by 谷歌翻译
The previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in types, scale, pose, occlusion, and lighting conditions. The current object detectors like yolov5 and faster RCNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on FGVD Dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
translated by 谷歌翻译
The task of reconstructing 3D human motion has wideranging applications. The gold standard Motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing the multi-view Mo- Cap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion thus making exciting applications like motion analysis and motiondriven animation accessible to the general public. However, performance of existing HMR methods degrade when the video contains challenging and dynamic motion that is not in existing MoCap datasets used for training. This reduces its appeal as dynamic motion is frequently the target in 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field. It is optimized to represent the underlying 3D motions across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action,and show that NeMo achieves better 3D reconstruction compared to various baselines.
translated by 谷歌翻译